Abstract


Suppose that k new treatments have been developed with the purpose of
replacing the standard treatment with the best new one, provided that it is
actually an improvement on the standard treatment. In a parametric approach,
mainly under the assumption of monotone likelihood ratios (MLR), procedures
are considered which, at a first stage, screen out inferior treatments through
statistical tests at a common level of significance $\alpha_0$. If none
(exactly one) is not eliminated, none (this one) will be used as a replacement.
Otherwise, if more than one treatment survives this screening process, the
non-eliminated treatment which is judged to be the best, after additional data
have been observed from the selected treatments, will be chosen as the
replacement. The topics of this paper are how to choose the terminal decision
at the second stage and the tests at the first stage, and how to implement the
appropriate procedures so as to meet certain pre-specified performance
criteria.


Key  Words:  Multiple  comparisons  with  a  control;  2-stage  procedures;  screening 
procedures. 


1. Introduction. The following procedure, which is used in certain clinical
studies, may serve as a motivation for the considerations in this paper.
Suppose that k new treatments have been developed with the purpose of
replacing the standard treatment with the best new one, provided that it is
actually an improvement on the standard treatment. In a pilot study,
each new treatment is applied several times and screened out if it is not
considered to be significantly better than the standard treatment. Here,
judgement is gained through suitable statistical tests at a fixed level
$\alpha_0$. If all k new treatments are eliminated, the standard treatment will
not be replaced. If exactly one new treatment is not eliminated, it will
be taken as the replacement. In all other cases, the remaining treatments
are further examined in a follow-up study through additional applications,
and finally the one which appears to be the best will be used as a replacement
of the standard treatment. The natural questions of how to choose the
tests in the first stage and the terminal decision in the second stage are
the topic of this paper.

Let $\pi_1, \ldots, \pi_k$ be k populations associated with unknown parameters
$\theta_1, \ldots, \theta_k \in \Omega$. Let $\theta_0$ be a control value which
may be known or unknown. In the latter case, assume that there is also a
control population $\pi_0$. A population $\pi_i$ is considered to be better than
$\pi_0$ if $\theta_i > \theta_0$, $i = 1, \ldots, k$. The goal is to determine,
in two stages, whether there is any population better than the control and, in
the affirmative, which one is associated with the largest parameter. Assume
that samples $\underline{X}_i = (X_{i1}, \ldots, X_{in_i})$, $i = 0, 1, \ldots, k$,
and $\underline{Y}_j = (Y_{j1}, \ldots, Y_{jm_j})$, $j = 1, \ldots, k$, can be
drawn from $\pi_0, \pi_1, \ldots, \pi_k$ at the first and at the second stage,
respectively, which are mutually independent. Let $\{f_\theta\}_{\theta \in \Omega}$
be a given family of densities with respect to $\mu$, the Lebesgue measure on
$\mathbb{R}$ or the counting measure on any lattice in $\mathbb{R}$, and assume
that for every $i \in \{0, 1, \ldots, k\}$ all observations from $\pi_i$ have a
common distribution with density $f_{\theta_i}$. Later on, after Theorem 1 has
been proved, we will make the additional assumption that for every sample
$\underline{Z}$ of size n from one population there exists a sufficient
statistic $T_n(\underline{Z})$ such that the family of joint densities has
nondecreasing likelihood ratios in $T_n$. For notational convenience let these
statistics be denoted in the following by

$U_i = T_{n_i}(\underline{X}_i)$, $W_i = T_{n_i + m_i}(\underline{X}_i, \underline{Y}_i)$, $i = 1, \ldots, k$.


To simplify the presentation the case of a known control value $\theta_0$ will
be considered first. Before we define a natural class of two-stage procedures
in a concise way, let us briefly describe how these procedures will typically
be applied. For every testing problem $H_i: \theta_i \le \theta_0$ versus
$K_i: \theta_i > \theta_0$ the experimenter chooses a test based on
$\underline{X}_i$ with a fixed level $\alpha_0$ and another test based on
$(\underline{X}_i, \underline{Y}_i)$ with a variable level $\alpha$. At Stage 1
he discards all populations which are not significant at level $\alpha_0$ under
the first set of tests. If none (exactly one) is left, he decides that none
(this one) is better than the control and is the best population. If more than
one population survives he proceeds to Stage 2. At Stage 2, he draws additional
samples $\underline{Y}_j$ from those populations which have been selected at
Stage 1 and makes a final decision in favor of that population among the
selected ones which has the smallest p-value (i.e. is most significant) under
the associated second test.


If these tests are upper level tests, which for simplicity may be taken to be
non-randomized for the moment to fix ideas, based on some real-valued statistics
$U_i$ and $W_i$, say, $i = 1, \ldots, k$, then the procedure considered above can
be equivalently described as follows: At Stage 1 all $\pi_i$'s are selected with
$U_i > c_i$ (where $c_i$ is the upper $\alpha_0$-fractile of $U_i$ under
$\theta_i = \theta_0$), and a final decision is made in terms of the largest
$W_j$ among the selected $\pi_j$'s. The truncated versions of such procedures
(i.e. those which perform Stage 1 only) have been studied by several authors,
see, for example, Gupta and Sobel (1958) and Lehmann (1961). For further
references see Gupta and Panchapakesan (1979), Chapter 20. Some preliminary
results concerning two-stage procedures of the type described above in the case
of $n_1 = \ldots = n_k$ and $m_1 = \ldots = m_k$ can be found in Gupta and
Miescke (1982), which include a comparison with the one-stage analog by
Bechhofer and Turnbull (1978).

To begin with, let us point out that several definitions given in Miescke
(1979) will be relevant in the sequel but for brevity are not repeated here.
In particular, tests may be randomized ones taking values in [0,1]. This
typically occurs in discrete cases or in continuous type cases where
nonparametric (rank) tests are of concern. Thus significance statements as well
as p-values are understood to be based on additional randomization schemes as
are used in Miescke (1979). To be more specific, let
$\underline{A} = (A_1, \ldots, A_k)$ and $\underline{B} = (B_1, \ldots, B_k)$ be
the randomization schemes for the first and the second stage, respectively.
Note that the $\underline{X}_i$'s, $\underline{Y}_j$'s, $\underline{A}$ and
$\underline{B}$ altogether are assumed to be mutually independent.

The class of two-stage procedures. For $i = 1, \ldots, k$, let
$\varphi_i = \{\varphi_{i,\alpha}\}_{\alpha \in [0,1]}$ be a right continuous and
monotone (in $\alpha$) unbiased test for $H_i$ versus $K_i$ which is
standardized at $\theta_i = \theta_0$. Assume that $\varphi_{i,1} = 1$ outside of
the support of the distribution of $\underline{X}_i$ at $\theta_i = \theta_0$.
Let $\underline{\varphi}_{\alpha_0} = (\varphi_{1,\alpha_0}, \ldots, \varphi_{k,\alpha_0})$
where $0 < \alpha_0 < 1$ is fixed. Analogously, let
$\psi_i = \{\psi_{i,\alpha}\}_{\alpha \in [0,1]}$ be such a test for $H_i$ versus
$K_i$ based on $(\underline{X}_i, \underline{Y}_i)$. Let
$\underline{\psi} = (\psi_1, \ldots, \psi_k)$. Let $\mathcal{S}$ be the class
of all procedures of the following type $(\underline{\varphi}_{\alpha_0}, \underline{\psi})$:

Stage 1: Select $\pi_i$ if $p_{\varphi_i}(\underline{X}_i, A_i)$, the p-value of
$\underline{X}_i$ under $\varphi_i$, is smaller than $\alpha_0$,
$i = 1, \ldots, k$. If none (exactly one) of the populations is
selected, stop and decide none (this one) is better than $\pi_0$
and is the best population. Otherwise proceed to Stage 2.

Stage 2: Among the selected populations decide finally in favor of
that $\pi_j$ which has the smallest p-value
$p_{\psi_j}(\underline{X}_j, \underline{Y}_j, B_j)$ under $\psi_j$.


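The following result will prove to be useful in various respects, except
for the important question of how to optimize the component $\underline{\psi}$
in $(\underline{\varphi}_{\alpha_0}, \underline{\psi})$.

Theorem 1. Let $(\underline{\varphi}_{\alpha_0}, \underline{\psi}) \in \mathcal{S}$.
For notational convenience, let
$E_i = E_{\theta_i}(\varphi_{i,\alpha_0}(\underline{X}_i))$ and
$F_i(\alpha) = E_{\theta_i}(\varphi_{i,\alpha_0}(\underline{X}_i)\, \psi_{i,\alpha}(\underline{X}_i, \underline{Y}_i))$,
$\alpha \in [0,1]$, $i = 1, \ldots, k$, $\underline{\theta} \in \Omega^k$. Then for
every non-empty $D \subseteq \{1, \ldots, k\}$ and $\underline{\theta} \in \Omega^k$,

(1)  $P_{\underline{\theta}}\{\text{final decision falls into } D\} = \int_0^1 \prod_{j \notin D} [1 - F_j(\alpha)] \, d\Bigl(1 - \prod_{i \in D} [1 - F_i(\alpha)]\Bigr)$,

(2)  $P_{\underline{\theta}}\{\text{final decision is in favor of } \pi_i\} = \int_0^1 \prod_{j=1,\, j \ne i}^{k} [1 - F_j(\alpha)] \, dF_i(\alpha)$,  $i = 1, \ldots, k$,

(3)  $P_{\underline{\theta}}\{\text{final decision is made at Stage 1 in favor of } \pi_i\} = \prod_{j=1,\, j \ne i}^{k} [1 - E_j] \, E_i$,  $i = 1, \ldots, k$,

(4)  $P_{\underline{\theta}}\{\text{final decision is in favor of the control}\} = \prod_{j=1}^{k} [1 - E_j]$.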
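As a small numerical illustration of (3) and (4), the following sketch computes the Stage 1 decision probabilities from given first-stage powers $E_1, \ldots, E_k$. The function name `stage1_probabilities` and the example values are hypothetical and only serve to show how the products are formed; the probability of proceeding to Stage 2 is obtained as the remainder.

```python
from math import prod


def stage1_probabilities(E):
    """Evaluate formulas (3) and (4) for given first-stage powers E_1..E_k.

    Returns (p_stage1, p_control, p_stage2), where p_stage1[i] is the
    probability that the procedure stops at Stage 1 in favor of population i,
    p_control is the probability that all populations are eliminated, and
    p_stage2 is the remaining probability of proceeding to Stage 2.
    """
    k = len(E)
    # Formula (4): every population is eliminated at Stage 1.
    p_control = prod(1.0 - e for e in E)
    # Formula (3): population i is the only one that is not eliminated.
    p_stage1 = [
        E[i] * prod(1.0 - E[j] for j in range(k) if j != i)
        for i in range(k)
    ]
    p_stage2 = 1.0 - p_control - sum(p_stage1)
    return p_stage1, p_control, p_stage2


# Hypothetical first-stage powers for k = 3 populations.
p_stage1, p_control, p_stage2 = stage1_probabilities([0.1, 0.5, 0.8])
print(p_stage1, p_control, p_stage2)
```

Because the samples are independent across populations, only the k marginal powers enter these two formulas; the second-stage tests play no role until more than one population survives.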


Proof: It has been shown in Miescke (1979) [cf. (2.3) - (2.5) loc. cit.]
that the distribution function of each p-value appearing in
$(\underline{\varphi}_{\alpha_0}, \underline{\psi})$ equals the power function of
the corresponding test, which is a continuous function of $\alpha \in [0,1]$ at
every fixed parameter point, and which at $\alpha = 1$ assumes the value one.

Let now D be a non-empty subset of $\{1, \ldots, k\}$. For $j = 1, \ldots, k$, let
$p^*_{\psi_j}(\underline{X}_j, \underline{Y}_j, B_j)$ be equal to
$p_{\psi_j}(\underline{X}_j, \underline{Y}_j, B_j)$ if
$p_{\varphi_j}(\underline{X}_j, A_j) < \alpha_0$, and let it be equal to 1
otherwise. Then it is easy to see that the l.h.s. of (1) is equal to

$P_{\underline{\theta}}\{\min_{i \in D} p^*_{\psi_i}(\underline{X}_i, \underline{Y}_i, B_i) < \min_{j \notin D} p^*_{\psi_j}(\underline{X}_j, \underline{Y}_j, B_j)\}$.

Since for $i \in \{1, \ldots, k\}$ and $\theta_i \in \Omega$,
$P_{\theta_i}\{p^*_{\psi_i}(\underline{X}_i, \underline{Y}_i, B_i) \le \alpha\}$ is
equal to $F_i(\alpha)$ if $0 \le \alpha < 1$, and is equal to 1 if $\alpha = 1$,
(1) follows by standard arguments. (2) is a special case of (1) which was
stated only because of its relevance in later applications. The verification
of (3) and (4) is straightforward and therefore the proof is omitted.


Remark 1. It will be shown in Section 2 that under the assumption of monotone
likelihood ratios (MLR), every $(\underline{\varphi}_{\alpha_0}, \underline{\psi}) \in \mathcal{S}$
is dominated by $(\underline{\varphi}_{\alpha_0}, \underline{\psi}^*)$
if $n_1 + m_1 = \ldots = n_k + m_k$, where $\underline{\psi}^*$ consists of the
uniformly most powerful (UMP) tests. For this purpose the results of Theorem 1
will not be of great help. There is, however, a particular situation where (1)
and (2) can be used for a similar purpose. Suppose that the data of Stage 1 are
not available but the information on which populations have been significant is
at hand. Then one has to use tests at Stage 2 which depend only on the
$\underline{Y}_j$'s from the selected populations. In this case every
$F_i(\alpha)$ factorizes into the product of the two power functions of
$\varphi_{i,\alpha_0}$ and $\psi_{i,\alpha}$, respectively, and therefore
(1) and (2) are completely determined through these power functions. For
example, if $D_+(\underline{\theta}) = \{i \mid \theta_i > \theta_0,\ i \in \{1, \ldots, k\}\}$
is not empty, then (1) for $D = D_+(\underline{\theta})$ is maximized by the
procedure which uses the UMP-tests at both stages. This is true even if the
$n_i + m_i$'s are not assumed to be all equal. Since these and related results
in such a special case, however, are considered to be of less statistical
importance they will not be discussed in further detail.
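The factorized case of Remark 1 lends itself to a direct numerical evaluation of (2) and (4). The sketch below is only an illustration under assumptions that are not part of the text: normal populations with a common known standard deviation `sigma`, equal sample sizes `n` and `m`, one-sided Gauss tests at both stages, and a simple trapezoid-type Stieltjes sum on a grid. All names (`power`, `final_decision_probs`, `delta`) are hypothetical; `delta[i]` stands for $\theta_i - \theta_0$.

```python
from math import prod, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(alpha, shift):
    """Power of the one-sided level-alpha Gauss test at a standardized shift."""
    if alpha <= 0.0:
        return 0.0
    if alpha >= 1.0:
        return 1.0
    return 1.0 - STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1.0 - alpha) - shift)


def final_decision_probs(delta, n, m, alpha0, sigma=1.0, grid=2000):
    """Approximate (2) and evaluate (4) when F_i(a) = E_i * beta_i(a).

    E_i is the power of the Stage 1 test (based on n observations) and
    beta_i(a) the power of a Stage 2 test that uses only the m new
    observations, which is the factorization described in Remark 1.
    """
    k = len(delta)
    E = [power(alpha0, sqrt(n) * d / sigma) for d in delta]
    alphas = [t / grid for t in range(grid + 1)]
    F = [
        [E[i] * power(a, sqrt(m) * delta[i] / sigma) for a in alphas]
        for i in range(k)
    ]
    probs = []
    for i in range(k):
        total = 0.0
        for t in range(grid):
            # Integrand prod_{j != i} [1 - F_j] at both ends of the cell.
            g_left = prod(1.0 - F[j][t] for j in range(k) if j != i)
            g_right = prod(1.0 - F[j][t + 1] for j in range(k) if j != i)
            # Stieltjes increment dF_i over the cell.
            total += 0.5 * (g_left + g_right) * (F[i][t + 1] - F[i][t])
        probs.append(total)
    p_control = prod(1.0 - e for e in E)  # formula (4)
    return probs, p_control


probs, p_control = final_decision_probs([0.0, 0.5, 1.0], n=10, m=20, alpha0=0.1)
print(probs, p_control)
```

As a consistency check, the probabilities in (2) summed over all i, together with (4), should add up to approximately one, since (1) with $D = \{1, \ldots, k\}$ equals $1 - \prod_j [1 - E_j]$.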


In the case of an unknown control parameter $\theta_0$ some obvious changes
have to be made. First of all the tests $\varphi_{i,\alpha_0}$ depend now on
$(\underline{X}_i, \underline{X}_0)$, $i = 1, \ldots, k$, whereas
$\underline{\psi}$ remains the same as before. Let $\mathcal{S}'$ denote the
class of two-stage procedures of the type
$(\underline{\varphi}_{\alpha_0}, \underline{\psi})$ in this case. The analog of
Theorem 1 for $\mathcal{S}'$ can be attained by replacing the right hand sides
of (1) - (4) by their integrals with respect to the distribution of
$\underline{X}_0$. If not explicitly stated otherwise, the results to be derived
in the sequel for $\mathcal{S}$ have analogous counterparts for $\mathcal{S}'$
which, for brevity and because of the close similarities, will not be
formulated or proved.


Let $D_+(\underline{\theta}) = \{i \mid \theta_i > \theta_0\}$ be the "good"
populations and $D_-(\underline{\theta}) = \{1, \ldots, k\} \setminus D_+(\underline{\theta})$
be the "bad" ones, $\underline{\theta} \in \Omega^k$. Also let us partition the
parameter space $\Omega^k$ into
$\Omega^k_- = \{\underline{\theta} \in \Omega^k \mid \theta_i \le \theta_0,\ i = 1, \ldots, k\}$
and its complement $\Omega^k_+$, say. A procedure is said to make a correct
selection (CS) at $\underline{\theta} \in \Omega^k_-$ if all populations are
eliminated at Stage 1, and it is said to make a correct selection at
$\underline{\theta} \in \Omega^k_+$ if a final decision is made in favor of a
population with the largest $\theta$-value. Let the goal be now to find a
procedure in $\mathcal{S}$ which has a large probability of a correct selection
(PCS) on $\Omega^k$. From now on we assume that the family
$\{f_\theta\}_{\theta \in \Omega}$ has the MLR-property as specified in
Section 1. Then the following partial solution to our problem can be given.


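Theorem 2. Let $(\underline{\varphi}_{\alpha_0}, \underline{\psi}) \in \mathcal{S}$.
If $n_1 + m_1 = \ldots = n_k + m_k$ and $\psi_1 = \ldots = \psi_k$,
then for all $\underline{\theta} \in \Omega^k$,

(5)  $\sum_{\sigma} P_{\sigma(\underline{\theta})}\{\text{CS under } (\underline{\varphi}_{\alpha_0}, \underline{\psi})\} \le \sum_{\sigma} P_{\sigma(\underline{\theta})}\{\text{CS under } (\underline{\varphi}_{\alpha_0}, \underline{\psi}^*)\}$,

where $\underline{\psi}^*$ consists of the UMP-tests for $H_i$ versus $K_i$,
$i = 1, \ldots, k$, which in this case are all identical, and where the
summation is with respect to all k! permutations $\sigma$ of $(1, \ldots, k)$,
with $\sigma(\underline{\theta}) = (\theta_{\sigma(1)}, \ldots, \theta_{\sigma(k)})$.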


Proof: Only an outline of the proof will be given since it follows by
decision theoretic arguments similar to those used previously in Gupta and
Miescke (1983).

Under the assumptions stated above, the associated decision function of
$(\underline{\varphi}_{\alpha_0}, \underline{\psi})$ which determines final
selections at Stage 2 is permutation invariant. The loss function which is
implicitly employed is zero if a correct selection is made, and is one
otherwise. Its component which is associated with final selections at Stage 2
is permutation invariant and favors selections of populations with large
parameters.

Let now $\underline{\theta} \in \Omega^k_+$ be fixed and, in a Bayes approach,
assume that the unknown parameter vector is random and has a prior distribution
which gives equal mass 1/k! to all permutations $\sigma(\underline{\theta})$ of
$\underline{\theta}$. Then the posterior distribution of the parameter vector,
given $W_i = w_i$, $i = 1, \ldots, k$, has the decreasing in transposition (DT)
property. From this fact and the properties of the loss function stated above
it follows that the optimal final decision at Stage 2 is the natural one which
is made in favor of the non-eliminated population with the largest $W_i$, where
ties are broken at random. Clearly this is equivalent to selecting the
non-eliminated population with the smallest p-value under the tests
$\underline{\psi}^*$. The proof is now completed by noting that (5) gives
a comparison of the corresponding Bayes risks, where of course at all
$\underline{\theta} \in \Omega^k_-$ the probabilities of a correct selection are
the same for both procedures.


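Corollary 1. If, under the assumptions of Theorem 2, additionally
$n_1 = \ldots = n_k$ and $\varphi_{1,\alpha_0} = \ldots = \varphi_{k,\alpha_0}$
is given, then for all $\underline{\theta} \in \Omega^k$,

(6)  $P_{\underline{\theta}}\{\text{CS under } (\underline{\varphi}_{\alpha_0}, \underline{\psi})\} \le P_{\underline{\theta}}\{\text{CS under } (\underline{\varphi}_{\alpha_0}, \underline{\psi}^*)\}$.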


Proof: Since both procedures considered here are completely permutation
invariant, and since also the 0-1 loss function employed is permutation
invariant, their risk functions are symmetric functions of
$\underline{\theta} \in \Omega^k$. Therefore all summands on the l.h.s. of (5)
coincide and the same holds for the summands on the r.h.s. of (5).

Remark 2. The proof of Theorem 2 actually applies more generally to the
following situation. No matter how the populations are eliminated at
Stage 1, if $n_1 + m_1 = \ldots = n_k + m_k$, and if only permutation invariant
final decision functions are admitted at Stage 2, then every Bayes procedure
w.r.t. any symmetric prior employs the natural rule at Stage 2. Of course,
also from a non-Bayesian point of view, (5) is an intuitively appealing
criterion. It simply reflects the lack of knowledge of how the sample
sizes are associated with the k ordered population parameters. Since
there is not even an approximately similar result available in the case of
unequal $n_i + m_i$'s, it is strongly recommended to "repair" the design of
every experiment with unequal $n_i$'s by choosing the $m_i$'s appropriately to
get equal overall sample sizes. Let us assume from now on that
$n_1 + m_1 = \ldots = n_k + m_k = N$, say, holds.

Actually, in various selection problems authors have chosen their
designs such that the statistics on which the natural final decision rule
is based have joint distributions with the DT property. To mention a
few relevant examples, Bechhofer (1954), Bechhofer, Dunnett and Sobel (1954),
and Dudewicz and Dalal (1975) have done so to be able to implement their
procedures at certain specified performance requirements. It may now be added
that exactly in these designs the employed natural final decisions are optimal
in terms of the risk or the PCS, respectively, uniformly on all parameter
configurations.

Designs in which the statistics used for final decisions lack the DT property,
in cases where this property could be attained in principle, should be
considered pathological. Even in the simplest case of a one-stage selection
procedure serious difficulties arise, as may be illustrated by the following
problem which was emphasized by Bechhofer (1982).

Example. Let $\bar{X}_1, \ldots, \bar{X}_k$ be independent sample means with
unknown expectations $\theta_1, \ldots, \theta_k$ and with known but different
variances $q_1, \ldots, q_k$, generated from k normal populations. For the
problem of selecting a population with the largest $\theta$-value no procedure
exists which has a largest PCS, uniformly in
$\underline{\theta} \in \mathbb{R}^k$. If k = 2 the natural rule is Bayes w.r.t.
independent priors $N(0, q_1)$ and $N(0, q_2)$, respectively, and it can be seen
to be admissible under the 0-1 loss function. However, if $k \ge 3$ the natural
rule cannot be Bayes with respect to any (multivariate) normal prior. The
question of whether it is admissible is still open. On the other hand, for
every normal prior with expectation $(0, \ldots, 0)$ for which the posterior
distribution has the DT-property (which leads to a simple solution), the Bayes
rule selects in terms of the largest $\bar{X}_i / q_i$, $i = 1, \ldots, k$.

In the remainder of this paper only procedures of the type
$(\underline{\varphi}_{\alpha_0}, \underline{\psi}^*) \in \mathcal{S}$
will be considered. This is justified in view of the assumption
$n_1 + m_1 = \ldots = n_k + m_k = N$ and of Theorem 2. The following result,
which generalizes Theorem 3 of Gupta and Miescke (1982), can be used to find
least favorable parameter configurations (LFC) of such procedures on suitable
subspaces of $\Omega^k$.

Theorem 3. Let $(\underline{\varphi}_{\alpha_0}, \underline{\psi}^*) \in \mathcal{S}$
where for every $i \in \{1, \ldots, k\}$ the power function of
$\varphi_{i,\alpha_0}$ is nondecreasing in $\theta_i$. Then the performance
characteristics considered in Theorem 1 have the following monotonicity
properties. (1) is nondecreasing in $\theta_i$, $i \in D$, and nonincreasing in
$\theta_j$, $j \notin D$. (2) and (3) are nondecreasing in $\theta_i$ and
nonincreasing in $\theta_j$, $j \ne i$. (4) is nonincreasing in
$\theta_1, \ldots, \theta_k$.

Proof: The assertions concerning (3) and (4) are obviously true. To
prove those concerning (1) and (2) note that for $i \in \{1, \ldots, k\}$,
$\underline{\theta} \in \Omega^k$, and $\alpha \in [0,1]$,

(7)  $F_i(\alpha) = E_{\theta_i}(\varphi_{i,\alpha_0}(\underline{X}_i)) \; E_{\theta_i}(\psi^*_{i,\alpha}(\underline{X}_i, \underline{Y}_i) \mid p_{\varphi_i}(\underline{X}_i, A_i) < \alpha_0)$.

The first factor on the r.h.s. of (7) is nondecreasing in $\theta_i$ according
to the assumptions made above. The second factor can be seen to have the same
property by applying Theorem 1 of Simons (1980), which guarantees that for
every sample from an MLR-family, conditionally on any portion of the
information in the sample which one might choose to extract, likelihood
ratios are still stochastically nondecreasing. Since
$\psi^*_{i,\alpha}(\underline{X}_i, \underline{Y}_i)$ is a nondecreasing function
of $W_i$, the proof is completed by noting that $A_i$ can be ignored since the
arguments apply to every situation $A_i = a_i$, where $a_i \in [0,1]$ is held
fixed.


Corollary 2. Under the assumptions of Theorem 3, let $n_1 = \ldots = n_k$ and
$\varphi_{1,\alpha_0} = \ldots = \varphi_{k,\alpha_0}$. Then for every
$\underline{\theta} \in \Omega^k$ with $\theta_1 \le \ldots \le \theta_k$, the
probability of a final decision in favor of population $\pi_i$ is nondecreasing
in $i \in \{1, \ldots, k\}$.

Proof: Given the assumptions above it can be seen from the proof of Theorem 3
that for $\theta_1 \le \ldots \le \theta_k$,

(8)  $F_1(\alpha) \le F_2(\alpha) \le \ldots \le F_k(\alpha)$,  $\alpha \in [0,1]$.

Therefore the assertion follows from (2) by the same technique which was
used in (IV) of Miescke (1979).


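Theorem 4. Let $(\underline{\varphi}_{\alpha_0}, \underline{\psi}^*) \in \mathcal{S}$
where $\underline{\varphi}_{\alpha_0}$ consists of consistent tests. Then
for increasing sample sizes $n_i$ and $m_i = N - n_i$, $i = 1, \ldots, k$, the
probability of a correct selection tends to one at all
$\underline{\theta} \in \Omega^k_+$ and at all $\underline{\theta} \in \Omega^k_-$
with $\theta_i < \theta_0$, $i = 1, \ldots, k$.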


Proof: If $\underline{\theta} \in \Omega^k_+$ with exactly one coordinate greater
than $\theta_0$, or if $\underline{\theta} \in \Omega^k_-$ with
$\theta_i < \theta_0$, $i = 1, \ldots, k$, the assertion follows immediately from
(3) and (4), respectively. For all other $\underline{\theta} \in \Omega^k_+$ the
probability that all populations $\pi_i$ with $\theta_i > \theta_0$ will not be
eliminated at Stage 1 tends to one. Moreover,
$P_{\underline{\theta}}(W_i > W_j \text{ for all } j \ne i)$ also tends to one if
$\theta_i$ is the unique maximum of $\theta_1, \ldots, \theta_k$. This can be
seen as follows. Selecting in terms of the largest $W_j$ is equivalent to
selecting in terms of the smallest p-value under tests $\tilde{\psi}_j$ which
are essentially the same tests as $\psi^*_j$ but now standardized at
$\theta_i$. By Theorem 2 of Miescke (1979) it follows that

(9)  $P_{\underline{\theta}}\{W_i > W_j \text{ for all } j \ne i\} = \int_0^1 \prod_{j \ne i} [1 - E_{\theta_j}(\tilde{\psi}_{j,\alpha}(\underline{X}_j, \underline{Y}_j))] \, d\alpha$,

which now can be seen to tend to one if N tends to infinity. If, however,
none of the good populations is eliminated at Stage 1 and $W_i > W_j$ for all
$j \ne i$, then a correct selection is made. This completes the proof in the
given parameter configuration. The case of more than one best population
can be treated similarly.


Focusing now on the first component $\underline{\varphi}_{\alpha_0}$ in the
procedures $(\underline{\varphi}_{\alpha_0}, \underline{\psi}^*) \in \mathcal{S}$,
the natural choice is of course $\underline{\varphi}^*_{\alpha_0}$ which consists
of the corresponding UMP-tests for $H_i$ versus $K_i$ based on
$\underline{X}_i$, or more precisely, based on $U_i$, $i = 1, \ldots, k$.
Even though such a choice cannot be justified (not even in the case of
$n_1 = \ldots = n_k$) by an overall improvement on the PCS, several strong
reasons can be quoted in support of choosing
$(\underline{\varphi}^*_{\alpha_0}, \underline{\psi}^*)$. First, of course, all
results derived hitherto hold for this procedure. Second, the following
can be stated.


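Theorem 5. Among all procedures in $\mathcal{S}$,
$(\underline{\varphi}^*_{\alpha_0}, \underline{\psi}^*)$ maximizes

(10)  $P_{\underline{\theta}}\{\text{CS}\}$ at every $\underline{\theta} \in \Omega^k_-$,

(11)  $P_{\underline{\theta}}\{\text{CS at Stage 1}\}$ at every $\underline{\theta} \in \Omega^k_+$ with exactly one $\theta_i > \theta_0$,

(12)  $E_{\underline{\theta}}(\text{number of good populations selected at Stage 1})$ at every $\underline{\theta} \in \Omega^k$,

(13)  $E_{\underline{\theta}}(\text{number of bad populations eliminated at Stage 1})$ at every $\underline{\theta} \in \Omega^k$.

Proof: (10) follows from (4) and (11) follows from (3). All arguments are
standard and are based on the well known properties of the power functions
of the UMP-tests $\varphi^*_{i,\alpha_0}$, $i = 1, \ldots, k$. Therefore, no
further details will be given.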

Third, all permutation invariant procedures
$(\underline{\varphi}_{\alpha_0}, \underline{\psi}^*)$, except
$(\underline{\varphi}^*_{\alpha_0}, \underline{\psi}^*)$ itself, can be modified,
without changing the sizes of the selected subsets of populations at Stage 1,
in a specific way which leads to an improvement on the PCS, provided that the
family $\{f_\theta\}_{\theta \in \Omega}$ is a strongly unimodal exponential
family. It should be noted that the modified procedure is also based on tests
but is no longer a member of the class $\mathcal{S}$. More precisely, from
Corollary 2 in Gupta and Miescke (1983) the following result can be derived.

Theorem 6. Let $\{f_\theta\}_{\theta \in \Omega}$ be a strongly unimodal (i.e.
log-concave) exponential family. Then every
$(\underline{\varphi}_{\alpha_0}, \underline{\psi}^*) \in \mathcal{S}$ with
$n_1 = \ldots = n_k$ and $\varphi_{1,\alpha_0} = \ldots = \varphi_{k,\alpha_0}$
can be improved by simply replacing the selected populations at Stage 1 by
the same number of populations, but now by those which are associated with the
largest $U_i$'s, where ties are broken at random. Then for all
$\underline{\theta} \in \Omega^k$,

(14)  $P_{\underline{\theta}}\{\text{CS under } (\underline{\varphi}_{\alpha_0}, \underline{\psi}^*)\} \le P_{\underline{\theta}}\{\text{CS under the modified procedure}\}$.
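The modification described in Theorem 6 can be sketched as follows. This is only an illustration of the replacement step, not an implementation taken from the paper; the function name `modified_stage1_selection`, the zero-based indexing of populations, and the use of a shuffle followed by a stable sort for random tie-breaking are all assumptions made for the example.

```python
import random


def modified_stage1_selection(selected, U, rng=None):
    """Replace a Stage 1 subset by the populations with the largest U_i.

    `selected` is the collection of indices chosen by the original Stage 1
    tests; only its size is used. `U` is the list of first-stage statistics.
    Returns a sorted list of the same size containing the indices with the
    largest U values, ties being broken at random.
    """
    rng = rng or random.Random()
    size = len(selected)
    order = list(range(len(U)))
    # Shuffle first: the sort below is stable, so among equal U values the
    # shuffled (random) order decides which index comes first.
    rng.shuffle(order)
    order.sort(key=lambda i: U[i], reverse=True)
    return sorted(order[:size])


# Hypothetical example: the original tests selected populations 0 and 3,
# but the two largest first-stage statistics belong to populations 1 and 2.
print(modified_stage1_selection({0, 3}, [0.2, 1.5, 0.9, 0.4]))
```

The size of the selected subset, and hence the decision whether to stop at Stage 1 or to proceed to Stage 2, is left unchanged; only the identity of the selected populations is altered.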

3. Applications With Illustrations In The Normal Case. In applications the
procedure $(\underline{\varphi}^*_{\alpha_0}, \underline{\psi}^*)$ will usually be
implemented so as to meet certain performance requirements. This will be
described in this section and will be illustrated by the example of k normal
populations $N(\theta_i, \sigma_i^2)$, $i = 1, \ldots, k$, where $U_i$ and $W_i$
are the corresponding sample means, $i = 1, \ldots, k$. Here the procedure
can be considered to be the two-stage analog of the one-stage procedure by
Bechhofer and Turnbull (1978). At first consider the basic requirement

(15)  $\inf \{ P_{\underline{\theta}}\{\text{CS under } (\underline{\varphi}^*_{\alpha_0}, \underline{\psi}^*)\} \mid \underline{\theta} \in \Omega^k_- \} = P_0^*$,

where $P_0^*$ is a predetermined constant. In view of (4) this can be
accomplished by choosing $\alpha_0$ to satisfy

(16)  $(1 - \alpha_0)^k = P_0^*$.
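Solving the condition $(1 - \alpha_0)^k = P_0^*$ for the first-stage level gives $\alpha_0 = 1 - (P_0^*)^{1/k}$. A minimal sketch of this computation follows; the function name `first_stage_level` and the example values are hypothetical.

```python
def first_stage_level(p0_star, k):
    """Return alpha_0 solving (1 - alpha_0)**k = P_0* for k populations."""
    if not 0.0 < p0_star < 1.0:
        raise ValueError("P_0* must lie strictly between 0 and 1")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 1.0 - p0_star ** (1.0 / k)


# Hypothetical requirement: P_0* = 0.90 with k = 4 new treatments.
alpha0 = first_stage_level(0.90, 4)
print(alpha0, (1.0 - alpha0) ** 4)
```

Note that the level per test shrinks as k grows, so that the probability of eliminating all k populations when none is better than the control stays at $P_0^*$.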

Then in the normal case the procedure is of the following form. At Stage 1,
population $\pi_i$ is selected if
$n_i^{1/2} (U_i - \theta_0) / \sigma_i > \Phi^{-1}(1 - \alpha_0)$,
$i = 1, \ldots, k$, where $\Phi$ denotes the c.d.f. of N(0,1). And at Stage 2, a
final decision is made in terms of the largest $W_j$ from the selected
populations.
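The two stages just described for the normal case can be sketched in code. The sketch assumes a known control value `theta0` and known standard deviations `sigma[i]`; the function name `two_stage_normal`, the callable `draw_second_stage` that supplies the additional observations, and the example data are hypothetical. Ties in the second-stage means are resolved here by taking the first maximum, which is a simplification of random tie-breaking.

```python
from math import sqrt
from statistics import NormalDist, fmean


def two_stage_normal(x, draw_second_stage, theta0, sigma, alpha0):
    """Two-stage selection for normal means with known theta0 and sigma_i.

    x[i] is the first-stage sample of population i, and draw_second_stage(i)
    returns its second-stage sample; it is called only for populations that
    reach Stage 2. Returns the index of the chosen population, or None if
    the control is retained.
    """
    z = NormalDist().inv_cdf(1.0 - alpha0)
    # Stage 1: keep populations whose standardized mean U_i exceeds z.
    selected = [
        i for i, xi in enumerate(x)
        if sqrt(len(xi)) * (fmean(xi) - theta0) / sigma[i] > z
    ]
    if not selected:
        return None
    if len(selected) == 1:
        return selected[0]
    # Stage 2: W_i is the mean of the pooled first- and second-stage data.
    W = {i: fmean(list(x[i]) + list(draw_second_stage(i))) for i in selected}
    return max(selected, key=lambda i: W[i])


# Hypothetical data for k = 3 populations with theta0 = 0 and sigma_i = 1.
x = [[0.0, 0.1, -0.1, 0.0], [1.0, 1.2, 0.8, 1.0], [1.5, 1.4, 1.6, 1.5]]
y = {1: [1.0, 1.0], 2: [1.4, 1.6]}
print(two_stage_normal(x, lambda i: y[i], 0.0, [1.0, 1.0, 1.0], 0.1))
```

Passing the second-stage data as a callable reflects the design of the procedure: additional observations are drawn only from populations that survive Stage 1, and only if more than one survives.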

Since (15) actually involves only the properties of the procedure at the
first stage, the $P_0^*$-condition can be attained by employing techniques
used for one-stage procedures. Thus (15) can be solved by taking recourse
to relevant papers in this area. For further details see Gupta and
Panchapakesan (1979).

A second requirement will typically employ the indifference zone
approach which is due to Bechhofer (1954). Let $\Delta > 0$ be fixed and let
$\Omega^k_\Delta = \{\underline{\theta} \in \Omega^k \mid \theta_i - \Delta \ge \theta_0, \theta_1, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_k \text{ for some } i\}$.
Now consider the requirement

(16)  $\inf \{ P_{\underline{\theta}}\{\text{CS under } (\underline{\varphi}^*_{\alpha_0}, \underline{\psi}^*)\} \mid \underline{\theta} \in \Omega^k_\Delta \} = P_1^*$,

where $P_1^*$ is a second predetermined constant. Even though Theorem 3 can
be used to find the LFC, it is technically too difficult to attain (16)
exactly. Therefore, the following conservative approach, which overprotects
the experimenter with respect to (16), is recommended and is easy to perform.
Let $\underline{\theta} \in \Omega^k_\Delta$ with
$\theta_i = \max\{\theta_1, \ldots, \theta_k\}$, say. If population $\pi_i$ is
selected at Stage 1 and $W_i$ is the unique maximum of $W_1, \ldots, W_k$ then a
correct selection is made. Therefore if the following two conditions are
fulfilled, and $\beta_1$ and $\beta_2$ are chosen to meet
$\beta_1 + \beta_2 - 1 = P_1^*$, then by Bonferroni's inequality it follows that
the l.h.s. of (16) is not smaller than $P_1^*$. The conditions are


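(17)  $E_{\theta_0 + \Delta}(\varphi^*_{j,\alpha_0}(\underline{X}_j)) \ge \beta_1$,  $j = 1, \ldots, k$, and

(18)  $\inf \{ P_{(\theta + \Delta, \theta, \ldots, \theta)}\{W_1 > W_2, \ldots, W_k\} \mid \theta \in \Omega \} \ge \beta_2$.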


In the normal case it is well known (cf. Tamhane and Bechhofer (1979))
that Slepian's inequality leads to better results than Bonferroni's inequality.
Use of the former allows one to choose $\beta_1$ and $\beta_2$ according to the
condition $\beta_1 \beta_2 = P_1^*$, which is preferable since
$\beta_1 + \beta_2 - 1 < \beta_1 \beta_2$ for $0 < \beta_1, \beta_2 < 1$.
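The gain from the product condition over the Bonferroni condition can be made concrete with a small computation. The sketch below assumes, purely for illustration, that the two bounds are split equally, $\beta_1 = \beta_2 = \beta$; the text does not prescribe any particular split, and the function name `equal_split_bounds` is hypothetical.

```python
from math import sqrt


def equal_split_bounds(p1_star):
    """Common bound beta_1 = beta_2 = beta under the two conditions.

    Bonferroni: 2 * beta - 1 = P_1*   gives beta = (1 + P_1*) / 2.
    Product:    beta ** 2    = P_1*   gives beta = sqrt(P_1*).
    """
    if not 0.0 < p1_star < 1.0:
        raise ValueError("P_1* must lie strictly between 0 and 1")
    beta_bonferroni = (1.0 + p1_star) / 2.0
    beta_product = sqrt(p1_star)
    return beta_bonferroni, beta_product


# Hypothetical requirement P_1* = 0.90.
beta_bonferroni, beta_product = equal_split_bounds(0.90)
print(beta_bonferroni, beta_product)
```

Since $(1 - \beta)^2 > 0$ implies $\beta^2 > 2\beta - 1$, the product condition is always met by a smaller common bound, so that (17) and (18) become easier to satisfy for the same $P_1^*$.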

To meet (17), standard techniques from the theory of testing hypotheses
can be used. And (18) can be attained by using results of single-stage
selection procedures in the indifference zone approach due to Bechhofer (1954).